Abstract

Long-lived (>20 days) sunspot groups extracted from the Greenwich Photo- 
heliographic Results (GPR) are examined for evidence of decadal change. The 
problem of identifying sunspot groups which are observed on consecutive solar 
rotations (recurrent sunspot groups) is tackled by first constructing manually an 
example dataset of recurrent sunspot groups and then using machine learning 
to generalise this subset to the whole GPR. The resulting dataset of recurrent 
sunspot groups is verified against previous work by A. Maunder and other Royal 
Greenwich Observatory (RGO) compilers. Recurrent groups are found to exhibit
a slightly larger constant in the Gnevyshev-Waldmeier relationship than the value
found by Petrovay and van Driel-Gesztelyi (Solar Phys. 51, 25, 1997), who
used recurrence data from the Debrecen Photoheliographic Results. Evidence 
for sunspot group lifetime change over the previous century is observed within 
recurrent groups. A lifetime increase by a factor of 1.4 between 1915 and 1940 is found,
which closely agrees with results from Blanter et al. (Solar Phys. 237, 329, 
2006). Furthermore, this increase is found to exist over a longer period (1915 to 
1950) than previously thought and provisional evidence is found for a decline 
between 1950 and 1965. Possible applications of machine-learning procedures to 
the analysis of historical sunspot observations, the determination of the magnetic 
topology of the solar corona and the incidence of severe space-weather events 
are outlined briefly. 

Keywords: Sunspots, neural networks, long-term change, non-linear, lifetime, 
Greenwich, sunspot nests, sunspot nestlet



1. Introduction 

The influence of the Sun on the Earth has attracted renewed attention in the 
context of climate change (Friis-Christensen and Svensmark, 1997). Sunspots 



1 Centre for Fusion, Space and Astrophysics, University of 
Warwick, UK 

2 UK Solar System Data Centre, Rutherford Appleton
Laboratory, Chilton, Didcot, Oxfordshire OX11 0QX, UK 
email: Richard.Henwood@stfc.ac.uk 

3 Space Science and Technology Department, Rutherford 
Appleton Laboratory, Chilton, Didcot, Oxfordshire OX11 
0QX, UK



Henwood et al.



are a good measure of solar activity and have been observed systematically for 
hundreds of years. The Royal Greenwich Observatory (RGO) began an effort 
to record the position and size of sunspot groups in 1874 and maintained this 
programme of solar observations until the end of 1976. The resulting dataset is 
unrivalled in its longevity and homogeneity. 

Several attempts to model the total quantity of solar radiation arriving at the 
Earth (the total solar irradiance) have been undertaken using various indices 
(Lean, Beer, and Bradley, 1995; Fligge and Solanki, 1997; Balmaceda, Krivova, 
and Solanki, 2007). Some of these attempts have relied on the measurements 
of sunspot number, since this index extends back for a few centuries. Modern
high-resolution imaging and measurements of sunspot properties are of limited
use in this context because changes in solar behaviour beyond the 22-year solar
cycle are expected to take place on centennial time scales (Blanter et al., 2006).

Studies of the temporal properties of sunspot groups (lifetime, maximum 
size, heliographic position, etc.) are hampered by two factors: short-lived sunspot
groups may be missed due to nightfall (Solanki, 2003) and the rotation of the 
Sun carries groups out of view from an Earth-bound observer. In addition, the 
effects of foreshortening and limb darkening will hamper reliable observation
away from the central meridian (Pierce and Slaughter, 1977). 

In order to quantify the limits of reliable observation of sunspot groups within 
the GPR, the longitude distribution of the apparent maximum size is presented 
in Figure 1. From this illustration one can conservatively conclude that observations at distances greater than 60° from the central meridian are difficult. This
result is consistent with theoretical findings (Kopecký, 1985).

Sunspot groups that are recorded at any stage with a central meridian distance 
<-60° or >+60° are classified as 'unreliably observed'. This subset is named 
GPR(unreliable) herein.
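
The classification above can be sketched in a few lines. This is a minimal illustration only: the function name and the representation of a group as a list of central meridian distances (in degrees, west positive) are assumptions made for the example, not the data format of the GPR.

```python
# Threshold taken from the text: observations beyond 60 degrees from the
# central meridian are treated as unreliable.
RELIABLE_LIMIT_DEG = 60.0


def is_unreliably_observed(central_meridian_distances_deg):
    """Return True if any observation of the group lies beyond +/-60 degrees.

    `central_meridian_distances_deg` is a hypothetical input: one signed
    central meridian distance per daily observation of a single group.
    """
    return any(abs(cmd) > RELIABLE_LIMIT_DEG
               for cmd in central_meridian_distances_deg)


# A group tracked from disk centre to near the west limb.
track = [5.0, 18.5, 31.7, 45.0, 58.2, 71.4]
print(is_unreliably_observed(track))
```

A group that is born and dies within ±60° never triggers the test, and so is excluded from GPR(unreliable).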

The considerable asymmetry of the distribution in Figure 1 may be explained 
by a combination of sunspot group decay rate and recurrence. This matter is 
briefly treated by Henwood (2008). However, a detailed investigation is outside 
the intended scope of this paper. 

Blanter et al. (2006) performed a non-linear study of short-term correlation properties of solar activity in order to reveal long-lifetime variations. This
method was applied to the GPR and an increase of lifetime by a factor of 1.4 
was observed from 1915 to 1940. A dataset of sunspot group lifetimes that are 
not truncated by solar rotation would allow direct measurement of lifetime and 
hence verify the observation of Blanter et al.

In this study a training dataset of recurrent sunspot groups is constructed by 
hand from longitude-time plots of GPR data. This subset of GPR data is bal- 
anced to ensure an equal number of linked and unlinked examples, and presented 
as training input to a number of feed-forward neural network architectures. Each 
network is trained using 10-fold cross validation and the network that displays 
the least overfitting or underfitting is selected. The chosen network is then trained
and applied to the entire GPR dataset. Finally, a simple filter is applied to the 
linked dataset and validation is performed. 
Figure 1. The longitudinal distance from the central meridian of the Sun at which a sunspot
group achieves maximum wholespot size. Error limits are the width of the bins (5°) in the 
x-direction and the square-root of the number of elements in the y-direction. GPR data are 
filtered to remove sunspot groups with only one observation, since these do not make mean- 
ingful maximum-size contributions. The resulting GPR data contain a maximum at around 
-75°. This is attributed to long-lived spots which are growing as they reach the west limb 
of the Sun or declining as they appear over the east limb of the Sun. The χ² value for this
distribution gives a high statistical significance (>99%) to the departure from uniformity. The 
declining measurements within the grey regions indicate that reliable observations towards the 
solar limbs are difficult. 



2. Method 

A meticulous observer can follow a long-lived sunspot group for successive days 
until it disappears from view over the west limb of the Sun. By taking note of 
the latitude of the last reliable observation, and with knowledge of the solar 
rotation period, the observer could wait to see if a sunspot group appeared over 
the east limb at roughly the same latitude and at the appropriate time. If such 
a prediction is fulfilled, one may conclude that the same sunspot group has been 
observed on consecutive rotations and that it should not be recorded as two 
separate sunspot groups. Ideally, a recurrent sunspot group should be identified 
as such, possibly by using the same sunspot group number for such recurrent 
observations. Within the digital version of the GPR, no recurrent information 
is recorded. Sunspot observations are grouped together and allocated a unique 
Greenwich group number if the group is observed on consecutive days. 

Becker (1955) studied Sonnenfleckenherde, which may be translated as 'focus
of sunspots'. The method employed to identify recurrence in that study was a
"statistical method". Sonnenfleckenherde were defined as "an area on the Sun
in which, during a longer period of time, spots appear or are being built". Size,
heliographic latitude and longitude are all taken into consideration during the
search for recurrent groups. This method was applied to GPR data from 1879
to 1941 and 46 Sonnenfleckenherde were catalogued.

Figure 2. A longitude-time diagram of GPR(unreliable) groups for the last six months of
1935. Time is in the x-direction, longitude in the y-direction. GPR(unreliable) groups are
displayed as a connected series of coloured circles. Colour corresponds to latitude (right-hand
colour bar); radius indicates size, corrected for foreshortening and measured in millionths of
the solar hemisphere. Example radii are provided for sunspot sizes of 1000, 2000, 3000 and 4000
millionths of the solar hemisphere (left of colour bar). Slanting grey dots mark the Carrington
longitude of the east and west limb of the Sun.

Subsequently, Castenmiller, Zwaan, and van der Zalm (1986) introduced the
term sunspot nest to describe the same phenomenon on the Sun. They de- 
fined nests as evolving within one month, lasting for 6 to 15 rotations and 
not expanding in latitude or longitude. In their study, the primary means of 
investigating sunspot nests was by visualising the GPR in heliographic longitude- 
time diagrams (Figure 2) and recording recurrent spots manually. Practically, 
Castenmiller, Zwaan, and van der Zalm (1986) required the following data for
their analysis: Carrington longitude, latitude and number of "visible days" a 
group was observed during one solar rotation. This method was applied to the 
period August 1959 - December 1964 of GPR data, and 41 probable sunspot 
nests were found. 

Within this paper a less stringent concept of a sunspot nest is employed, 
namely a sunspot nestlet. A nestlet is defined as two or more 'unreliably' observed 
sunspot groups which are linked together because they are likely to be the same 
group but have different group numbers in the GPR. Hence a nestlet physically 
corresponds to a white-light sunspot group observed on consecutive solar rotations. It is possible that nestlets are found preferentially in "active longitudes" or
"hot spot" zones, but such a study is outside the intended scope of this paper.
The method employed to find all the nestlets within the GPR is as follows: 

1. Construct a longitude-time diagram for a chosen solar cycle. 

2. By inspection, create a list of recurrent sunspot groups. 

3. Use this list to train a neural network. 

4. Apply the trained neural network to the whole GPR dataset. 

5. Post-process to remove outliers. 

2.1. Solar Differential Rotation 

Sunspot groups move on the photosphere with different speeds depending on 
latitude because of differential solar rotation (Schröter, 1985; Thompson et al.,
1996). According to the annals published by the Royal Greenwich Observatory, 
"it should be noted that longitudes are based on the ephemeris given in the 
Astronomical Ephemeris, assuming a solar rotation period constant at all lat- 
itudes" (Royal Greenwich Observatory, 1980). Hence, after an interval of one 
rotation, recurring groups would be expected to show a drift in longitude due to 
differential solar rotation. Using the analysis of solar rotation from GPR data, 
performed by Balthasar, Vázquez, and Wöhl (1986), one should expect a group
at an extreme latitude of 35° to exhibit a backwards drift of at most 1° day$^{-1}$
with respect to the equator. This would correspond to a drift of not more than
18° during the whole unreliable passage (18 days). The figure of 18 days is
derived using a synodic solar rotation period of 27 days and an unseen passage
of 180° (the far side of the Sun) plus the unreliable regions of the near side of the
Sun (2 × 30°). This gives: (240°/360°) × 27 days = 2/3 × 27 days = 18 days.
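
The arithmetic of this estimate can be checked with a short sketch. The function and parameter names are illustrative assumptions; the numerical values (27-day synodic period, 180° far side, two 30° limb zones, at most 1° per day of relative drift) are those quoted in the text.

```python
def unseen_passage_days(synodic_period_days=27.0,
                        far_side_deg=180.0,
                        limb_zone_deg=30.0):
    """Days a group spends outside the reliably observed +/-60 degree zone."""
    # Far side plus one unreliable zone at each limb of the near side.
    unseen_deg = far_side_deg + 2.0 * limb_zone_deg
    return unseen_deg / 360.0 * synodic_period_days


def worst_case_drift_deg(max_drift_deg_per_day=1.0):
    """Upper bound on longitude drift accumulated while the group is unseen."""
    return max_drift_deg_per_day * unseen_passage_days()


print(unseen_passage_days(), worst_case_drift_deg())
```

Both quantities evaluate to approximately 18, in days and degrees respectively, which reproduces the bound stated above.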

Figure 4 demonstrates that sunspot groups rarely move more than 5° in lati- 
tude and 15° in longitude over the duration of their observed life. Such movement 
appears to be independent of solar cycle development. Hence the worst case of 
18° longitude drift due to differential rotation is uncommon. 

2.2. Presenting GPR Data to the Neural Network 

The purpose of the preceding approximate analysis of sunspot movement is
to provide boundary conditions when converting GPR(unreliable) data into a 
form suitable for the neural network. This process consists of two stages. First, 
selecting groups that are possibly linked. Second, reducing each possible linkage
into a suitable network input. 

For the first stage, the group movement metrics calculated above provide 
approximate boundary conditions when considering if any two sunspot groups 
could be linked. It is asserted here that for such a link to exist from an initial 
group, the linked group (post group) must appear within latitude and longitude 
bounds of ±15° and ±50° respectively. These bounds are at the extreme end of 
observed group movement against an average solar rotation (Section 2.1). 

In addition, a post group must appear in the future (with respect to the 
initial group) and within a time window that starts when the first longitude 
bound appears at the east limb of the Sun, and ends when the last longitude 
bound appears at the east limb of the Sun. These bounds are calculated using
an ephemeris (Meeus, 1991). A deliberately generous window is chosen because 
decision making regarding recurrence is performed by the neural network, not 
during pre-processing of the data. 
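
A minimal sketch of this first-stage candidate test is given below. It is a simplification under stated assumptions: the `GroupSummary` record and its fields are hypothetical, and the ephemeris-based time window is reduced to the requirement that the post group is first seen after the initial group is last seen. Only the ±15° and ±50° bounds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """Hypothetical per-group summary used only for this illustration."""
    number: int
    first_seen_day: float      # day count of first observation
    last_seen_day: float       # day count of last observation
    latitude_deg: float
    carrington_longitude_deg: float


def longitude_difference_deg(a_deg, b_deg):
    """Signed difference a - b, wrapped into the interval [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def is_candidate_pair(initial, post, max_dlat_deg=15.0, max_dlon_deg=50.0):
    """Return True if `post` could be a recurrence of `initial`."""
    if post.first_seen_day <= initial.last_seen_day:
        return False  # the post group must appear in the future
    dlat = post.latitude_deg - initial.latitude_deg
    dlon = longitude_difference_deg(post.carrington_longitude_deg,
                                    initial.carrington_longitude_deg)
    return abs(dlat) <= max_dlat_deg and abs(dlon) <= max_dlon_deg


a = GroupSummary(1, 0.0, 9.0, 12.0, 350.0)
b = GroupSummary(2, 24.0, 33.0, 14.0, 10.0)
print(is_candidate_pair(a, b))
```

Wrapping the longitude difference matters because Carrington longitude is cyclic: 350° and 10° are only 20° apart. The bounds are deliberately loose, since the decision itself is left to the network.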

Every group which is ever observed near the unreliable west limb of the Sun
has this first stage applied to it and a set of potential linked candidates is
created. Once a pair of potential linked candidates has been identified, the pair
is encoded for presentation to the neural network. In the first step, the two
groups which make up the linking candidates are classified as an "initial group"
and a "post group". The "initial group" is the one that is observed first in time
and disappears over the west limb of the Sun. The "post group" is the one that
is observed later in time, and emerges over the east limb of the Sun.

The encoding of initial and post groups takes place as follows. The observation 
closest to the west limb is found for the initial group. For each observation 
in time, sunspot groups are quantised into a given number of bins for both 
latitude and longitude. Longitude and latitude are encoded separately. For the 
initial group, five latitude bins and seven longitude bins are used. The first 
observation of the group is placed in the middle bin (1) with the other bins 
at this time step set to zero (Figure 3, line 11 for longitude and line 27 for 
latitude). Tracking back from the west limb of the Sun, time steps are assumed 
to be 180/15 °/day (Figure 3, lines 11-17 and lines 27-33). This time step 
corresponds to the typical observed movement of a group due to solar rotation. 
If no group is observed during a particular time step, all the bins are set to zero 
(0). Subsequent group observations are placed in bins relative to the previous 
observation of the group. Bin widths are 1/2° and 1/7° for longitude and latitude 
respectively. This process is repeated for seven time steps and is illustrated for 
group numbers 8929 and 8965 in Figure 3. Uncoded groups are visualised on the 
left, with the corresponding network encoding on the right. 

The process for treatment of the post group is similar. For the post group, 
9 latitude bins and 21 longitude bins are used (Figure 3, lines 19-25 and lines 
3-9). Again, seven observations in time are made. In the case of the post group,
however, all binning is performed relative to the last observed longitude of the 
initial group. Extreme deviations are limited at the bin furthest from the middle 
bin. 
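
The binning described in this section can be illustrated with a small sketch. It shows only the general idea: a position offset is mapped to a single "one" in a row of bins, empty time steps become rows of zeros, and extreme offsets are clamped to the outermost bin. The function names are hypothetical, the bin width is passed in as a parameter rather than fixed, and the offsets are assumed to have been computed already (relative to the previous observation for the initial group, or to the last observed longitude of the initial group for the post group).

```python
def one_hot_bin(offset_deg, n_bins, bin_width_deg):
    """Encode one offset as a row of zeros with a single one.

    An offset of zero falls in the middle bin; offsets beyond the range
    are limited to the bin furthest from the middle.
    """
    if n_bins % 2 == 0:
        raise ValueError("an odd number of bins is needed for a middle bin")
    middle = n_bins // 2
    index = middle + round(offset_deg / bin_width_deg)
    index = max(0, min(n_bins - 1, index))  # clamp extreme deviations
    row = [0] * n_bins
    row[index] = 1
    return row


def encode_track(offsets_deg, n_bins, bin_width_deg, n_steps=7):
    """Encode up to `n_steps` time steps; `None` marks a missing observation."""
    rows = []
    for step in range(n_steps):
        offset = offsets_deg[step] if step < len(offsets_deg) else None
        if offset is None:
            rows.append([0] * n_bins)  # no group observed at this step
        else:
            rows.append(one_hot_bin(offset, n_bins, bin_width_deg))
    return rows


for row in encode_track([0.0, None, 2.1], n_bins=7, bin_width_deg=2.0):
    print(row)
```

With seven longitude bins for the initial group and 21 for the post group, the same routine would serve both encodings by changing `n_bins`.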

2.3. Constructing Training Data 

In this study, the judgment of the first author (Henwood, 2008) was initially used 
to decide if any two sunspot groups were close enough to represent a recurrent 
group. The dataset was balanced and a feed-forward neural network system was 
then trained with these judgments. In this context, a balanced dataset is one 
with an equal number of positive and negative examples. The system generalises 
from the examples presented to it and captures the uncertainty of the task. 
When a previously unseen case of possible recurrence is introduced, the trained 
neural network provides a probability that this is indeed a case of recurrence. 

Solar cycle 15 (August 1913 - August 1923) has been selected as the period
to identify recurrence manually, since this cycle is neither the longest nor the
shortest. It can also be described as a period when the overall sunspot group
number is neither particularly large nor small. In short, this cycle is chosen
because it is "typical".

Figure 3. Schematic example of encoding a link between two groups a (initial) and b (post).
"Zeros" and "ones" signify the absence and presence (respectively) of a sunspot group in a
longitude or latitude bin. The moderate variation in longitude is visible in the encoded data
(lines 11-17). The post link group is encoded with observations nearest the limb ranked
highest. Thus the ninth line of the data representation encodes the longitude of group b
observed closest to the east limb. The line of "ones" to the right of centre indicates that
this group appears at a greater longitude than the initial group. Similarly, latitude (indicated
by colour) is encoded in lines 27-33 for group a and lines 19-25 for group b. Line 37 encodes the
judgement on whether or not the two groups shown are the same recurrent group; accordingly,
'zero' is false, 'one' is true. Line 35 includes the Greenwich group numbers of the two groups.
These are not used in decision making; they are used to identify groups after the neural network
decision making process is complete.

Only sunspot groups in the GPR(unreliable) dataset are considered for recurrence. Pairs of groups which appear to be linked by recurrence are selected for the
entire ten years of GPR(training). The training data, GPR(training), consists of
4073 examples, 621 of which are "linked" (probability = 1); the remainder are
"unlinked" (probability = 0). Such a dataset, which contains unequal numbers
of true and false examples, is called "unbalanced". Provost (2000) highlights the
problems associated with using unbalanced data with standard machine learning
algorithms. Fawcett (2004) suggests the use of receiver operating characteristic
(ROC) curves to evaluate a classifier trained on unbalanced data, while Lawrence
et al. (1998) provide a number of suggestions in order to balance unbalanced
data.

Figure 4. Extreme deviation is measured for all sunspot groups within the GPR that are
observed to exist for more than ten days. The extreme deviation is measured as the difference
between the absolute maximum and minimum recorded heliographic positions ("maximum
deviation", x-direction). Deviation counts are binned into solar cycle bins. Each solar cycle
is divided into nine bins ("solar cycle bin", y-direction). Deviation counts are normalised
("normalised occurrence", z-direction). The extreme latitude deviation (top) illustrates that
groups commonly move a few degrees but rarely more than five degrees. Extreme longitude
(bottom) deviations indicate a greater spread but with groups rarely moving more than 15°.

In this study, the training dataset is balanced using two techniques: firstly,
the addition of random noise (switching a single random adjacent bit in the
network representation) and secondly, a technique based on a priori knowledge
of the problem domain. This second technique uses the following assumption:
if one accepts that a given pair of groups is linked, any pair which has a
smaller unseen latitude and/or longitude deviation must also be linked. Hence,
new linked groups can be created from existing groups by re-encoding a linked
pair of groups with a lower sensitivity than that described in Section 2.2. Using
these two techniques, the training data are balanced to produce 3756 positive
examples and 3827 negative examples.
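
The first balancing technique can be sketched as follows. The interpretation is an assumption: "switching a single random adjacent bit" is read here as moving the single "one" in one randomly chosen row of an encoded example to a neighbouring bin. The function names and the list-of-rows representation are illustrative only.

```python
import random


def jitter_row(row, rng):
    """Move the single one in a row to a randomly chosen neighbouring bin."""
    if 1 not in row:
        return list(row)  # empty time step: nothing to move
    position = row.index(1)
    neighbours = [j for j in (position - 1, position + 1)
                  if 0 <= j < len(row)]
    if not neighbours:
        return list(row)
    jittered = [0] * len(row)
    jittered[rng.choice(neighbours)] = 1
    return jittered


def augment(example, rng):
    """Return a noisy copy of `example` with one occupied row jittered."""
    rows = [list(row) for row in example]
    occupied = [k for k, row in enumerate(rows) if 1 in row]
    if occupied:
        k = rng.choice(occupied)
        rows[k] = jitter_row(rows[k], rng)
    return rows


rng = random.Random(0)
linked_example = [[0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]
print(augment(linked_example, rng))
```

Each call yields a new positive example that differs from the original by one bin, so repeated calls can raise the number of linked examples towards the number of unlinked ones.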

Ten-fold cross validation (Kohavi, 1995) has been used during training to provide the learning and error rates. These are measured to identify network designs
that exhibit overfitting or underfitting (Moore, 2001). A design is judged free
of overfitting if the error rate does not increase as learning takes place, and
free of underfitting if learning approaches a stable
minimum. Six network designs were constructed with varying numbers of hidden
layers and interconnects. These designs are documented in the Master's Thesis of
Henwood (2008), which also provides an assessment of the performance of each
network. The trained network that exhibits the least overfitting or underfitting
is presented with the GPR(unreliable) dataset and, after classification, returns
a new dataset with recurrent sunspots grouped and labelled with the group
number of the first observed sunspot group. This dataset is called GPR(linked')
and contains 5374 group linkages.
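
For readers unfamiliar with ten-fold cross validation, the partitioning step can be sketched as below. This is a generic illustration, not the training code used in this study: the function name, the shuffling seed and the use of index lists are assumptions, and the network training itself is omitted.

```python
import random


def k_fold_indices(n_examples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_examples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional example each.
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


# Balanced training set size quoted in the text: 3756 + 3827 examples.
n_total = 3756 + 3827
for training, validation in k_fold_indices(n_total):
    print(len(training), len(validation))
```

Every example is used for validation exactly once, so the error rate averaged over the ten folds is measured on data that the corresponding network has not seen during training.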

A comparison of GPR(training) with GPR(linked') was then performed by 
hand. This revealed 20 or so linked groups identified by the neural network that 
exhibited a large deviation of latitude or longitude during unseen passage. We 
judged these to be physically unlikely but since there is no absolute criterion 
for this classification these could in principle be identified as linked groups by 
another observer. We have chosen a criterion with which to filter the data. 

A filter was constructed which removed linked groups that exhibited a large
latitude deviation (> 8.5°) or whose post group was not observed near the east
limb, which corresponds to a large (> 19.5°) longitude deviation. This filter
removed 17 linkages (3%) from GPR(training) and 244 (5%) linkages from 
GPR(linked'). This suggests that the majority of the "physically unlikely" groups 
were not a result of overfitting or underfitting but are the result of subjectivity 
during selection of linked groups for the training dataset GPR(training). 
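
The filter amounts to two threshold tests, sketched below. The thresholds are those stated above; the function name and the representation of a linkage as a pair of deviations (in degrees) accumulated over the unseen passage are assumptions for the illustration.

```python
MAX_LATITUDE_DEVIATION_DEG = 8.5
MAX_LONGITUDE_DEVIATION_DEG = 19.5


def keep_linkage(latitude_deviation_deg, longitude_deviation_deg):
    """Return True if a linkage passes the post-processing filter."""
    return (abs(latitude_deviation_deg) <= MAX_LATITUDE_DEVIATION_DEG
            and abs(longitude_deviation_deg) <= MAX_LONGITUDE_DEVIATION_DEG)


# Hypothetical linkages as (latitude deviation, longitude deviation).
linkages = [(1.2, 4.0), (9.0, 3.0), (-2.5, -21.0), (8.5, 19.5)]
filtered = [link for link in linkages if keep_linkage(*link)]
print(filtered)
```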

A final dataset, GPR(linked), was produced by applying the filter described
above to GPR(linked'). The result is a dataset with contentious linkages removed. This dataset is used throughout the remainder of this study and is
available at the UK Solar System Data Centre
(http://www.ukssdc.ac.uk/wdcc1/greenwich/recurrence).

3. Comparison with Manual Datasets 

The UK Solar System Data Centre (http://www.ukssdc.ac.uk) maintains a com- 
plete set of annals that were published by the Royal Greenwich Observatory; 
these provided the source for the GPR. As an appendix to the Greenwich ob- 
servations, a "Catalogue of Recurrent Groups of Sun Spots, 1874 - 1906" was 
compiled by A.S.D. Maunder and published in 1909. Between 1916 and 1955,
the "Ledgers of Groups of Sunspots" included two sections: "Recurrent" and
"Non-Recurrent". Recurrent groups were also identified between 1907 and 1915
but tabulation of the information during that time was different.

While a digital version of these records is not known to exist, the method used 
to compile recurrent spots is documented (in the Greenwich Photoheliographic 
Results, 1956): 

Recurrent groups were selected upon the following plan, reference being made
to the General Catalogue:- If any spot when first seen was 60° or more to the
east of the central meridian, the catalogue and, if necessary, the Daily Results
also, were searched some fifteen to sixteen days earlier to ascertain whether
a spot group of similar heliographic longitude and latitude was then near the
west limb of the Sun. Similarly, if any spot group when last seen was 60° or
more to the west of the central meridian, a search was made fifteen to sixteen
days later. When there appeared to be a case of probable continuity between
groups in consecutive rotations of the Sun, the character of the groups, their
areas and their longitude and latitude have been carefully compared before
accepting them as recurrent groups.

Table 1. Recurrent sunspot groups found by Henwood (2008), termed GPR(training),
and two other recurrent group datasets tabulated within the Royal Greenwich Observatory
publications are compared with GPR(linked). The columns of values, from left to right
are: the number of linked groups which the human and neural network classifier agree on
(True Positive count); the number of linkages which both the human and neural network
agree are not linked (True Negative count); the number of links which the neural network
classifies as true but the human does not (False Positive count); the number of linkages the
human thinks are true but the neural network does not (False Negative count).

Dataset          True Positives   True Negatives   False Positives   False Negatives
GPR(training)    450              3799             34                170
RGO 1958         50               2032             106               11
RGO 1896         15               180              13                11

Between 1874 and 1906, Maunder catalogued 624 recurring groups: 468 were 
seen only in two rotations, 118 appeared in three rotations, 25 in four rotations, 
12 in five rotations, and 1 (somewhat doubtfully) in six rotations. 

The process of verification addresses the question of whether or not the work 
described in this paper has been completed correctly. One approach is to compare 
GPR(linked) with the recurrence observations that were published by the Royal 
Greenwich Observatory. Recurrence data for 1896 and 1958 were typed in and 
compared with GPR(linked). For interest, the performance of the neural network
against the training data, GPR(training), was also evaluated. 

Table 1 compares these three different linked sunspot group datasets with 
GPR(linked) using metrics for a confusion matrix (Kohavi and Provost, 1998). 
True positive counts are links which are in both the manual and GPR(linked)
datasets. False positives are links which are only in GPR(linked). False negatives
are links which are in the manual dataset but not in GPR(linked). True negatives
are defined as: TN = Total - TP - FP - FN.
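
The confusion-matrix bookkeeping can be made concrete with the counts of Table 1. The sketch below is illustrative: the class and function names are assumptions, and precision and recall are standard derived quantities that are not reported in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    true_pos: int
    true_neg: int
    false_pos: int
    false_neg: int

    @property
    def total(self):
        return (self.true_pos + self.true_neg
                + self.false_pos + self.false_neg)

    def precision(self):
        """Fraction of network links that the manual dataset confirms."""
        return self.true_pos / (self.true_pos + self.false_pos)

    def recall(self):
        """Fraction of manual links that the network also finds."""
        return self.true_pos / (self.true_pos + self.false_neg)


def true_negatives(total, true_pos, false_pos, false_neg):
    """TN = Total - TP - FP - FN, as defined in the text."""
    return total - true_pos - false_pos - false_neg


# Counts copied from Table 1.
TABLE_1 = {
    "GPR(training)": ConfusionCounts(450, 3799, 34, 170),
    "RGO 1958": ConfusionCounts(50, 2032, 106, 11),
    "RGO 1896": ConfusionCounts(15, 180, 13, 11),
}

for name, counts in TABLE_1.items():
    print(name, round(counts.precision(), 3), round(counts.recall(), 3))
```

Read this way, the large false-positive count for 1958 shows up as low precision rather than low recall, which is consistent with the network finding links that the manual compilers did not record.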

By rapidly constructing longitude-time diagrams that correspond to the pe- 
riod of interest the instances of false classification have been investigated individ- 
ually. As may be expected, GPR(training) data performs the best overall. It does 
not achieve 100% success compared to the human classification. On inspection 
of the "False Positive" classifications made by the neural network, all but one of 
these linkages were discovered to be physically likely linkages which the human 
classifier had overlooked. During inspection by longitude-time diagrams it was 
observed that these "False Positive" results occurred in dense complexes of many 
groups. The neural network out-performs the human under these conditions. 

The Royal Greenwich Observatory 1958 recurrence records show the highest 
number of false positives. On investigating the set of examples classified as false 
positives, it was observed that these were again a result of a complication arising 



S0LA_1023.tex; 25/07/2009; 13:22; p. 10 



Lifetime of Recurrent Sunspot Groups 



during observation of longitude-time diagrams. It should be noted that 1958
was a year during which the sunspot number reached a record high. This level 
of activity corresponds to a greater number of observations to display on a 
longitude-time diagram, which increases the challenge to a human of identifying 
all potentially linked groups. 

In addition, the method used during 1958 was restricted to providing only 
one link between two given groups. However, the neural network classifier could 
find multiple linkages between groups. Finally, the neural network classifier could 
also make a link between groups even if one of the groups was only visible for a 
single day. Such a linkage is apparently not permitted within the 1958 recurrence 
dataset. 

The RGO 1896 recurrence records showed the greatest discrepancy from the 
neural network linked dataset. This is probably because the criteria used to iden- 
tify a recurrent group are different from both the 1958 and the GPR(training) 
datasets. Maunder used heliographic position, allowing for a sunspot group to 
remain unseen for some days and marking it as recurrent if another group was 
observed at the same heliographic position. The method employed here is to 
require a sunspot group to meet the 'unreliably observed' criterion. Some of
the groups marked as recurrent by Maunder failed to be seen within 30° of 
the relevant limb. It should also be noted that 1958 was a year of considerable 
activity on the Sun, whereas 1896 was relatively quiet, reducing the opportunity 
to observe recurrence. 

4. Gnevyshev-Waldmeier Rule within Recurrent Groups

After classifying recurrence, one might expect the GPR(linked, reliable) dataset
to exhibit some of the well established physical characteristics of non-recurrent
sunspot groups. The Gnevyshev-Waldmeier rule (Gnevyshev, 1938; Waldmeier,
1955) states that sunspot group maximum area ($A_0$) and lifetime ($T$) are proportional:

$$A_0 = D_{GW}\,T, \qquad D_{GW} \approx 10\ \mathrm{MSH\ day^{-1}}, \qquad (1)$$

where $D_{GW}$ is the constant of proportionality measured in millionths of the solar
hemisphere (MSH) per day.

In their paper on sunspot decay, Petrovay and van Driel-Gesztelyi (1997) 
used the Debrecen recurrence dataset (Dezső, Gerlei, and Kovács, 1987; Dezső,
Gerlei, and Kovács, 1997). They chose to use groups which "were born on
the visible hemisphere and also died there", so that their lifetimes could be 
determined accurately. Since only one group satisfied this criterion, they applied 
the Gnevyshev-Ringnes correction to the remaining recurrent groups to make a 
total dataset of 128 groups. The Gnevyshev-Ringnes correction provides the 
probability that birth and death will occur in the visible hemisphere. After 
binning, they found a least squares linear fit of $D_{GW} = 10.89 \pm 0.48$ MSH day$^{-1}$.
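
A least squares estimate of the constant of proportionality can be sketched as follows. The sketch assumes a straight line through the origin, as Equation (1) implies, fitted to unbinned points; it is not necessarily the procedure of Petrovay and van Driel-Gesztelyi (1997), who binned their data first. The function name and the synthetic input values are illustrative only.

```python
import math


def fit_proportionality(lifetimes_days, max_areas_msh):
    """Least squares slope of area = D * lifetime, with its standard error."""
    n = len(lifetimes_days)
    if n != len(max_areas_msh) or n < 2:
        raise ValueError("need at least two matching (lifetime, area) pairs")
    sum_tt = sum(t * t for t in lifetimes_days)
    if sum_tt == 0:
        raise ValueError("lifetimes must not all be zero")
    sum_ta = sum(t * a for t, a in zip(lifetimes_days, max_areas_msh))
    slope = sum_ta / sum_tt
    residual_sum = sum((a - slope * t) ** 2
                       for t, a in zip(lifetimes_days, max_areas_msh))
    # One fitted parameter, hence n - 1 degrees of freedom.
    standard_error = math.sqrt(residual_sum / (n - 1) / sum_tt)
    return slope, standard_error


# Synthetic values for illustration, not data from the GPR.
lifetimes = [20.0, 35.0, 50.0, 80.0]
areas = [230.0, 380.0, 610.0, 900.0]
print(fit_proportionality(lifetimes, areas))
```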

There are 841 groups in GPR(linked, reliable) that are reliably observed
according to the criterion that both the birth and death of a group must take
place within ±60° of the solar central meridian. Using reliably observed groups
means the Gnevyshev-Ringnes correction is not required. Figure 5 shows a linear
fit through GPR(linked, reliable) of $D_{GW} = 11.73 \pm 0.26$.

Figure 5. Recurrent sunspot group lifetime (measured in days) plotted against maximum
wholespot size measured in millionths of the solar hemisphere (MSH). Lifetime is measured as
the duration between first and last observation of a sunspot group. The GPR(linked, reliable)
dataset contains 841 observations and a linear fit of $D_{GW} = 11.73 \pm 0.26$ is found. The data
are divided into three age categories (grp 1, grp 2 and grp 3) and the centre of mass of each
is indicated with a cross. The error bounds are marked by dashed lines and $D_{GW} = 10$ is
included for comparison.

All of the lifetimes within GPR(linked, reliable) are accurate to within one
day. The corresponding measurement of group maximum size is subject to some 
uncertainty because rotation of the Sun carries each recurrent group out of sight 
for a portion of its lifetime. The effect is that the value observed as the maximum 
is either the true maximum or a smaller value. 

The data show a large scatter around the linear fit (Figure 5). Regions between 
the bands of points (where no reliable observation is possible because either the 
birth or death of the group is unseen) make it somewhat difficult to determine 
the linear relationship. Three age categories are defined as follows: grp 1, ages 17 
to 45 days; grp 2, 46 to 72 days; grp 3, greater than 72 days. The centre of mass 
of each of these groups is indicated by a cross. In addition, since only recurrent 
sunspot groups are examined, there are no data points between the origin and
≈18 days.

Compared to the data analysed by Petrovay and van Driel-Gesztelyi (1997), 
the GPR(linked, reliable) data are more numerous. In addition, the GPR(linked,
reliable) data do not have a limit on the maximum size measured and only
recurrent groups are included in the fitting. For these reasons, one should not
expect exact agreement between D_GW in Figure 5 and values found from previous
investigations. In this study, D_GW has a smaller uncertainty but is larger
than previous estimates. The discrepancy between the value obtained in this
paper and the one found by Petrovay and van Driel-Gesztelyi (1997) is small,
if allowance is made for the appropriate error bounds, which suggests that for
recurrent groups D_GW is probably closer to 11 MSH day^-1 than 10 MSH day^-1.



5. Temporal Variations of Sunspot Group Lifetime 

The GPR(linked) dataset presents a unique opportunity to investigate changes 
of sunspot lifetime with time. Previous studies of this property were complicated 
for the reasons already outlined; namely, incomplete recurrence data compiled 
by different individuals using different criteria. 

Blanter et al. (2006) considered the topic of sunspot lifetime. They performed
a nonlinear study of the short-term correlation properties of solar activity
in order to reveal their long-lifetime variations. Their method was applied to
GPR(unreliable) and allows for the problems associated with recurrence within
the dataset and short-lived sunspots. These authors mitigate such factors by
examining the population of sunspot lifetimes that are between 1 and 15 days.
These observations were used to create a 22-year running-average series.

One of the conclusions of the study by Blanter et al. (2006) was that they 
found sunspot lifetime of GPR(unreliable) had increased over the duration of 
the dataset. In addition, they were able to quantify the change in lifetime, which 
increased by a factor of 1.4 over the interval from 1915 to 1940. During this 
period, GPR(linked) indicates a change from below to above average lifetimes,
as shown in Figure 6. 

Blanter et al. (2006) discuss the problems associated with both small sunspot 
groups, whose lifetime cannot be perfectly measured, and the lack of observations 
from the invisible side of the Sun. Because of this, they developed a technique 
that used sunspot group size and the Gnevyshev-Waldmeier relationship to infer
sunspot group lifetimes. Figure 6 presents GPR data (grey pattern region) and 
GPR(linked) (grey solid region) from our study. 

Large errors are observed in the calculation of lifetime from GPR data con- 
taining recurrent groups. GPR(linked) alone contains much reduced error and 
a trend can be observed. While GPR(linked) only contains groups that have a 
sufficiently long lifetime to be observed on more than one solar rotation, there 
are proportionally fewer such groups, which reduces the lifetime average during 
the sample window. 

GPR(linked) data shown in Figure 6 indicate a marked increase between 1910 
and 1950. Blanter et al. (2006) found that sunspot lifetime had increased by a 
factor of 1.4 between 1915 and 1940. The results presented here largely agree 
with that value. In addition, Figure 6 suggests that the increase may extend 
over a longer interval and also augments the work of Blanter et al. (2006) by 
introducing an uncertainty measure. 

Figure 6. Lifetime of sunspot groups versus time, calculated for the longest-lived sunspot on
a given day for 22-year (8035 days) moving averages. The GPR age estimate takes no account 
of recurrent sunspot groups and is plotted as a grey pattern contained within the bounds of 
error. GPR(linked) is plotted as a dark grey solid region containing the bounds of uncertainty. 
Error is calculated as the longest and shortest lifetime estimates of a sunspot group lifetime. 
Longest lifetime assumes that group birth was at the earliest possible unseen time and death 
was at the latest possible unseen time. Shortest lifetime is calculated assuming the earliest 
observation in GPR is the birth and the latest observation in GPR is the death. Variations in 
sunspot number are shown to indicate individual solar cycles. 
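
The 22-year (8035-day) moving average described in the caption can be sketched
as below. This is a minimal illustration under stated assumptions: the window
is taken to be centred on each day and days with no group are skipped, neither
of which is specified in the text. The function name is hypothetical.

```python
def moving_average(values, window):
    """Centred moving average over `window` samples.

    `values` holds one entry per day; None marks a day with no group.
    Assumptions: centred window, shortened at the ends of the series,
    and missing days ignored rather than counted as zero.
    """
    prefix_sum = [0.0]
    prefix_count = [0]
    for v in values:
        present = v is not None
        prefix_sum.append(prefix_sum[-1] + (v if present else 0.0))
        prefix_count.append(prefix_count[-1] + (1 if present else 0))

    half = window // 2
    n = len(values)
    averaged = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        count = prefix_count[hi] - prefix_count[lo]
        total = prefix_sum[hi] - prefix_sum[lo]
        averaged.append(total / count if count else None)
    return averaged


# Tiny made-up series with a 3-day window, only to show the behaviour.
smoothed = moving_average([10.0, None, 30.0, 50.0], window=3)
```

For a band like the one plotted, the same function would be applied twice with
`window=8035`: once to the daily series of shortest lifetime estimates and
once to the longest estimates.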



6. Conclusions 

Neural networks have been applied previously to various problems in space 
science. In particular, they have been used in the prediction of geomagnetic 
phenomena (Lundstedt, 1992; Lundstedt and Wintoft, 1994; Calvo, Ceccato,
and Piacentini, 1995; Gleisner, Lundstedt, and Wintoft, 1996; Williscroft and
Poole, 1996; Wu and Lundstedt, 1996; Weigel et al., 1999), the classification of
asteroid spectra (Howell, Merényi, and Lebofsky, 1994), and ionogram processing
(Galkin et al., 1996). They have also been applied to the important
problem of automatically classifying sunspots from data obtained by processing
SOHO/MDI satellite images (Nguyen et al., 2006). However, the authors are
not aware of the previous use of neural networks in studies of recurrent sunspot 
groups. 

It has been shown in this study that it is possible to train a neural network 
to identify recurrent sunspot groups within the Greenwich Photoheliographic
Results (GPR), which extend over the long interval 1874 - 1976 (Section 2). Once
trained, the neural network can often outperform a human classifier, particularly
when a large number of sunspot groups are present on the solar disc at the same
time. Since the neural network performs deterministically when classifying a
pair of linked sunspot groups, it operates with a consistency of judgement that 
exceeds the sterling endeavours of various individuals over more than a century. 

Nevertheless, the neural network method has some limitations. These are 
most clearly revealed when considering false-positive linkages (Section 3). In 
such cases, the black-box nature of the decision process becomes problematic 
and confidence in the neural network is partially undermined. In the method 
discussed in this paper, post-processing of the data is used to reduce false-positive 
linkages but in any future study it might be valuable to investigate alternative 
machine learning systems. 

Despite these limitations, the particular choice of network and data reduction
procedure employed in this study could also be applied to Rome Daily
Sunspot Reports (1958-2000), USSR Station Data (1968-1991), Mount Wilson
Individual Sunspot Data (1917-1985), Kodaikanal Individual Sunspot Data
(1906-1987) and Greenwich/Debrecen Observations (1874-2007), without retraining
the existing neural network. All of these datasets are available through
either the National Geophysical Data Centre (http://www.ngdc.noaa.gov) or the
UK Solar System Data Centre (http://www.ukssdc.ac.uk).

The constant of proportionality in the Gnevyshev-Waldmeier rule (A_0 =
D_GW T) derived in this study (Section 4) has been found to be greater than
previous estimates. Petrovay and van Driel-Gesztelyi (1997), who used information
on recurrent sunspot groups extracted from the Debrecen Photoheliographic
Results, also found a value of D_GW that was larger (10.89 MSH day^-1) than
previous estimates. In a study of sunspot group lifetimes, Zuccarello (1993)
presented results which show a change in the rotation rate between short-lived and
long-lived (11-day old) sunspot groups, pointing to a difference in "aggregation 
capability" of the group within the convection zone. The present results suggest 
that there may be some additional physics present within longer lived groups in 
the GPR(linked, reliable) dataset and thereby imply that this matter warrants 
further investigation. 

Evidence has been found for an increase in the lifetime of recurrent sunspot 
groups by a factor of about 1.4 between 1915 and 1940 (Section 5), which is in 
excellent agreement with the result obtained by Blanter et al. (2006). Indeed, 
this increase in lifetime actually occurs over a longer period (1915 - 1950) than 
previously thought and there is also provisional evidence for a slight decrease in 
lifetime between 1950 and 1965 (see Figure 6). 

Solar changes over periods longer than a few decades are currently of considerable
interest as the solar output is a significant input to climate models (Haigh
et al., 2005). The Gleissberg cycle is detected in sunspot number and has a
measured period of approximately 80-120 years (Gleissberg, 1967; Hoyt, Schatten,
and Nesmes-Ribes, 1994; Garcia and Mouradian, 1998). Garcia and Mouradian
(1998) estimated the most recent Gleissberg cycle minimum to be around 1900
and the maximum around 1965. The results presented here suggest that, if the
lifetimes of recurrent sunspot groups are a good proxy for the Gleissberg cycle,
the maximum occurred some years before 1965, during the 1950s. Further study of
this topic would be facilitated by applying the trained neural network to sunspot 
data in the interval 1977-2009. 

The analysis of historical sunspot observations using automated methods,
such as the neural network method presented in this paper, is required to esti- 
mate the (true) total number of sunspots emerging on the solar surface during the 
solar cycle. Numerical simulations of the centennial evolution of magnetic flux on 
the solar surface scale the sunspot emergence rate to the total sunspot number, 
measured without considering recurrent sunspots and their variable lifetimes in 
great detail (Wang, Lean, and Sheeley, 2002). A sunspot number that corrects 
for the recurrence of long-lived sunspots may improve the numerical predictions 
by providing a better description of emergence. The numerical simulations over 
centennial scales could be compared to the most recent estimates of long-term 
variation of the open magnetic flux on the solar surface (Rouillard, Lockwood, 
and Finch, 2007). Neural networks could also be optimised to detect the location 
of the umbra and penumbra of sunspots and to estimate the total magnetic flux 
inside these large-scale active regions. Such a rough estimate of the magnetic 
flux on the photosphere could then be used to estimate the magnetic topology 
of the corona using simplified potential field source surface models. 


1. Introduction

The influence of the Sun on the Earth has attracted renewed attention in the 
context of climate change (Friis-Christensen and Svensmark, 1997). Sunspots 
are a good measure of solar activity and have been observed systematically for 
hundreds of years. The Royal Greenwich Observatory (RGO) began an effort 
to record the position and size of sunspot groups in 1874 and maintained this 
programme of solar observations until the end of 1976. The resulting dataset is 
unrivalled in its longevity and homogeneity. 

Several attempts to model the total quantity of solar radiation arriving at the 
Earth (the total solar irradiance) have been undertaken using various indices 
(Lean, Beer, and Bradley, 1995; Fligge and Solanki, 1997; Balmaceda, Krivova, 
and Solanki, 2007). Some of these attempts have relied on the measurements 
of sunspot number, since this index extends back for a few centuries. Modern 
high resolution imaging and measurement of sunspot properties are of limited 
use because the characteristic times of solar change, on top of the 22-year solar 
cycle, are expected to take place on centennial time scales (Blanter et al., 2006).

Studies of the temporal properties of sunspot groups (lifetime, maximum 
size, heliographic position, etc.) are hampered by two factors: short-lived sunspot
groups may be missed due to nightfall (Solanki, 2003) and the rotation of the 
Sun carries groups out of view from an Earth-bound observer. In addition, the 
effects of fore-shortening and limb darkening will hamper reliable observation 
away from the central meridian (Pierce and Slaughter, 1977). 

In order to quantify the limits of reliable observation of sunspot groups within 
the GPR, the longitude distribution of the apparent maximum size is presented 
in Figure 1. From this illustration one can conservatively conclude that observa- 
tions at distances greater than 60° from the central meridian are difficult. This 
result is consistent with theoretical findings (Kopecky, 1985). 

Sunspot groups that are recorded at any stage with a central meridian distance 
<-60° or >+60° are classified as 'unreliably observed'. This subset is named 
GPR(unreliable) herein. 

The considerable asymmetry of the distribution in Figure 1 may be explained 
by a combination of sunspot group decay rate and recurrence. This matter is 
briefly treated by Henwood (2008). However, a detailed investigation is outside 
the intended scope of this paper. 

Blanter et al. (2006) performed a non-linear study of short-term correlation
properties of solar activity in order to reveal long-lifetime variations. This
method was applied to the GPR and an increase of lifetime by a factor of 1.4
was observed from 1915 to 1940. A dataset of sunspot group lifetimes that are
not truncated by solar rotation would allow direct measurement of lifetime and
hence verify the observation of Blanter et al.

In this study a training dataset of recurrent sunspot groups is constructed by
hand from longitude-time plots of GPR data. This subset of GPR data is balanced
to ensure an equal number of linked and unlinked examples, and presented
as training input to a number of feed-forward neural network architectures. Each
network is trained using 10-fold cross validation and the network that displays
the least overfitting or underfitting is selected. The chosen network is then trained
and applied to the entire GPR dataset. Finally, a simple filter is applied to the
linked dataset and validation is performed.

Figure 1. The longitudinal distance from the central meridian of the Sun at which a sunspot
group achieves maximum whole-spot size. Error limits are the width of the bins (5°) in the
x-direction and the square-root of the number of elements in the y-direction. GPR data are
filtered to remove sunspot groups with only one observation, since these do not make
meaningful maximum-size contributions. The resulting GPR data contain a maximum at around
-75°. This is attributed to long-lived spots which are growing as they reach the west limb
of the Sun or declining as they appear over the east limb of the Sun. The χ² value for this
distribution gives a high statistical significance (>99%) to the departure from uniformity. The
declining measurements within the grey regions indicate that reliable observations towards the
solar limbs are difficult.

2. Method 

A meticulous observer can follow a long-lived sunspot group for successive days 
until it disappears from view over the west limb of the Sun. By taking note of 
the latitude of the last reliable observation, and with knowledge of the solar 
rotation period, the observer could wait to see if a sunspot group appeared over 
the east limb at roughly the same latitude and at the appropriate time. If such 
a prediction is fulfilled, one may conclude that the same sunspot group has been 
observed on consecutive rotations and that it should not be recorded as two 
separate sunspot groups. Ideally, a recurrent sunspot group should be identified 
as such, possibly by using the same sunspot group number for such recurrent 
observations. Within the digital version of the GPR, no recurrent information 
is recorded. Sunspot observations are grouped together and allocated a unique 
Greenwich group number if the group is observed on consecutive days. 

Figure 2. A longitude-time diagram of GPR(unreliable) groups for the last six months of
1935. Time is in the x-direction, longitude in the y-direction. GPR(unreliable) groups are
displayed as a connected series of coloured circles. Colour corresponds to latitude (right-hand 
colour bar); radius indicates size, corrected for foreshortening and measured in millionths of 
the solar hemisphere. Example radii are provided for sunspot sizes of 1000, 2000, 3000 and 4000 
millionths of the solar hemisphere (left of colour bar). Slanting grey dots mark the Carrington 
longitude of the east and west limb of the Sun. 

Becker (1955) studied Sonnenfleckenherde, which may be translated as 'focus 
of sunspots'. The method employed to identify recurrence in that study was a
"statistical method". Sonnenfleckenherde were defined as "an area on the Sun
in which, during a longer period of time, spots appear or are being built". Size,
heliographic latitude and longitude are all taken into consideration during the 
search for recurrent groups. This method was applied to GPR data from 1879 
to 1941 and 46 Sonnenfleckenherde were catalogued. 

Subsequently, Castenmiller, Zwaan, and van der Zalm (1986) introduced the
term sunspot nest to describe the same phenomenon on the Sun. They defined
nests as evolving within one month, lasting for 6 to 15 rotations and
not expanding in latitude or longitude. In their study, the primary means of 
investigating sunspot nests was by visualising the GPR in heliographic longitude- 
time diagrams (Figure 2) and recording recurrent spots manually. Practically, 
Castenmiller, Zwaan, and van der Zalm (1986) required the following data for 
their analysis: Carrington longitude, latitude and number of "visible days" a 
group was observed during one solar rotation. This method was applied to the 
period August 1959 - December 1964 of GPR data, and 41 probable sunspot 
nests were found. 

Within this paper a less stringent concept of a sunspot nest is employed,
namely a sunspot nestlet. A nestlet is defined as two or more 'unreliably' observed
sunspot groups which are linked together because they are likely to be the same
group but have different group numbers in the GPR. Hence a nestlet physically
corresponds to a white-light sunspot group observed on consecutive solar
rotations. It is possible that nestlets are found preferentially in "active
longitudes" or "hot spot" zones, but such a study is outside the intended scope
of this paper. The method employed to find all the nestlets within the GPR is as
follows:

1. Construct a longitude-time diagram for a chosen solar cycle. 

2. By inspection, create a list of recurrent sunspot groups. 

3. Use this list to train a neural network. 

4. Apply the trained neural network to the whole GPR dataset. 

5. Post-process to remove outliers. 

2.1. Solar Differential Rotation 

Sunspot groups move on the photosphere with different speeds depending on 
latitude because of differential solar rotation (Schröter, 1985; Thompson et al.,
1996). According to the annals published by the Royal Greenwich Observatory,
"it should be noted that longitudes are based on the ephemeris given in the
Astronomical Ephemeris, assuming a solar rotation period constant at all
latitudes" (Royal Greenwich Observatory, 1980). Hence, after an interval of one
rotation, recurring groups would be expected to show a drift in longitude due to
differential solar rotation. Using the analysis of solar rotation from GPR data,
performed by Balthasar, Vázquez, and Wöhl (1986), one should expect a group
at an extreme latitude of 35° to exhibit a backwards drift of at most 1° day^-1
with respect to the equator. This would correspond to a drift of not more than
18° during the whole unreliable passage (18 days). The figure of 18 days is
derived using a synodic solar rotation period of 27 days and an unseen passage
of 180° (the far-side of the Sun) + the unreliable regions of the near side of the
Sun (2 × 30°). This gives: (240°/360°) × 27 days = 18 days.
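
The arithmetic behind the 18-day and 18° figures can be checked with a few
lines. The numbers are those quoted in the paragraph above; the variable names
are only illustrative.

```python
SYNODIC_PERIOD_DAYS = 27.0            # synodic rotation period used in the text
FAR_SIDE_DEG = 180.0                  # hemisphere hidden from the Earth
UNRELIABLE_NEAR_SIDE_DEG = 2 * 30.0   # two limb zones beyond 60 deg from the meridian
MAX_DRIFT_DEG_PER_DAY = 1.0           # upper bound quoted for a group at 35 deg latitude

# Total longitude range over which a group cannot be reliably observed.
unseen_deg = FAR_SIDE_DEG + UNRELIABLE_NEAR_SIDE_DEG

# Time taken to cross that range at the synodic rotation rate.
unseen_days = unseen_deg * SYNODIC_PERIOD_DAYS / 360.0

# Worst-case longitude drift accumulated while the group is unseen.
max_drift_deg = MAX_DRIFT_DEG_PER_DAY * unseen_days
```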

Figure 4 demonstrates that sunspot groups rarely move more than 5° in lati- 
tude and 15° in longitude over the duration of their observed life. Such movement 
appears to be independent of solar cycle development. Hence the worst case of 
18° longitude drift due to differential rotation is uncommon. 

2.2. Presenting GPR Data to the Neural Network 

The purpose of the preceding approximate analysis of sunspot movement is
to provide boundary conditions when converting GPR(unreliable) data into a 
form suitable for the neural network. This process consists of two stages. First, 
selecting groups that are possibly linked. Second, reducing each possible linkage 
into a suitable network input. 

For the first stage, the group movement metrics calculated above provide 
approximate boundary conditions when considering if any two sunspot groups 
could be linked. It is asserted here that for such a link to exist from an initial 
group, the linked group (post group) must appear within latitude and longitude 
bounds of ±15° and ±50° respectively. These bounds are at the extreme end of 
observed group movement against an average solar rotation (Section 2.1). 

In addition, a post group must appear in the future (with respect to the
initial group) and within a time window that starts when the first longitude 
bound appears at the east limb of the Sun, and ends when the last longitude 
bound appears at the east limb of the Sun. These bounds are calculated using 
an ephemeris (Meeus, 1991). A deliberately generous window is chosen because 
decision making regarding recurrence is performed by the neural network, not 
during pre-processing of the data. 
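
The first-stage selection can be sketched as follows. This is an illustration
rather than the authors' implementation: the `Observation` class, the function
names and the `window_days` argument are hypothetical, and the time window,
which the paper derives from an ephemeris, is simply supplied by the caller
here.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    day: float        # time of observation in days (any consistent epoch)
    latitude: float   # heliographic latitude in degrees
    longitude: float  # Carrington longitude in degrees, 0 <= value < 360


def longitude_difference(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def is_link_candidate(initial_last, post_first, window_days,
                      max_dlat=15.0, max_dlon=50.0):
    """Return True if post_first could be the return of the initial group.

    window_days is an (earliest, latest) delay in days; in the paper this
    window comes from an ephemeris, here it is an assumed input.
    """
    delay = post_first.day - initial_last.day
    earliest, latest = window_days
    if delay <= 0 or not (earliest <= delay <= latest):
        return False
    if abs(post_first.latitude - initial_last.latitude) > max_dlat:
        return False
    dlon = longitude_difference(post_first.longitude, initial_last.longitude)
    return abs(dlon) <= max_dlon


# Made-up observations: last sighting near the west limb, first near the east limb.
last_seen = Observation(day=0.0, latitude=10.0, longitude=350.0)
reappears = Observation(day=18.0, latitude=12.0, longitude=20.0)
candidate = is_link_candidate(last_seen, reappears, window_days=(10.0, 26.0))
```

The bounds are deliberately loose: every pair that passes is only a candidate,
and the decision on recurrence is left to the network.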

Every group which is ever observed near the unreliable west limb of the Sun 
has this first stage applied to it and a set of potential linked candidates is
created. Once a pair of potential linked candidates has been identified, they
are encoded for presentation to the neural network. In the first step, the two
groups which make up the linking candidates are classified as an "initial group"
and a "post group". The "initial group" is the one that is observed first in time
and disappears over the west limb of the Sun. The "post group" is the one that
is observed later in time, and emerges over the east limb of the Sun. 

The encoding of initial and post groups takes place as follows. The observation 
closest to the west limb is found for the initial group. For each observation 
in time, sunspot groups are quantised into a given number of bins for both 
latitude and longitude. Longitude and latitude are encoded separately. For the 
initial group, five latitude bins and seven longitude bins are used. The first 
observation of the group is placed in the middle bin (1) with the other bins 
at this time step set to zero (Figure 3, line 11 for longitude and line 27 for 
latitude). Tracking back from the west limb of the Sun, time steps are assumed 
to be 180/15 °/day (Figure 3, lines 11-17 and lines 27-33). This time step 
corresponds to the typical observed movement of a group due to solar rotation. 
If no group is observed during a particular time step, all the bins are set to zero 
(0). Subsequent group observations are placed in bins relative to the previous 
observation of the group. Bin widths are 1/2° and 1/7° for longitude and latitude 
respectively. This process is repeated for seven time steps and is illustrated for 
group numbers 8929 and 8965 in Figure 3. Uncoded groups are visualised on the 
left, with the corresponding network encoding on the right. 

The process for treatment of the post group is similar. For the post group, 
9 latitude bins and 21 longitude bins are used (Figure 3, lines 19-25 and lines 
3-9). Again, seven observations in time are made. In the case of the post group, 
however, all binning is performed relative to the last observed longitude of the 
initial group. Extreme deviations are limited at the bin furthest from the middle 
bin. 
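
The binning scheme can be sketched in a few lines. This is a simplified
illustration, not the authors' code: it assumes the offsets (in degrees,
relative to whichever reference position applies) have already been computed,
and the function names and the bin width used in the example are hypothetical.

```python
def one_hot_row(offset_deg, bin_width_deg, n_bins):
    """Encode one observation as a row with a single 1.

    An offset of zero lands in the middle bin; offsets beyond the range
    are clamped to the outermost bin, as described for extreme deviations.
    """
    middle = n_bins // 2
    index = middle + round(offset_deg / bin_width_deg)
    index = max(0, min(n_bins - 1, index))
    row = [0] * n_bins
    row[index] = 1
    return row


def encode_track(offsets_deg, bin_width_deg, n_bins, n_steps=7):
    """Encode up to n_steps observations; None means no observation."""
    rows = []
    for step in range(n_steps):
        offset = offsets_deg[step] if step < len(offsets_deg) else None
        if offset is None:
            rows.append([0] * n_bins)  # time step with no group seen
        else:
            rows.append(one_hot_row(offset, bin_width_deg, n_bins))
    return rows


# Made-up offsets and an assumed 2-degree bin width, seven longitude bins.
rows = encode_track([0.0, 0.0, 2.0, None], bin_width_deg=2.0, n_bins=7)
encoded = ["".join(str(bit) for bit in row) for row in rows]
```

Each returned row corresponds to one line of the kind shown in Figure 3, with
all-zero rows standing for time steps in which the group was not observed.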

2.3. Constructing Training Data 

In this study, the judgment of the first author (Henwood, 2008) was initially used 
to decide if any two sunspot groups were close enough to represent a recurrent 
group. The dataset was balanced and a feed-forward neural network system was 
then trained with these judgments. In this context, a balanced dataset is one 
with an equal number of positive and negative examples. The system generalises 
from the examples presented to it and captures the uncertainty of the task. 
When a previously unseen case of possible recurrence is introduced, the trained 
neural network provides a probability that this is indeed a case of recurrence. 

 1 # Input pattern 83 8929 -> 8965
 2 # post link group, longitude
 3 000000000000000001000
 4 000000000000000001000
 5 000000000000000001000
 6 000000000000000001000
 7 000000000000000001000
 8 000000000000000010000
 9 000000000000000010000
10 # initial group, longitude
11 0001000
12 0001000
13 0000100
14 0000010
15 0000010
16 0000000
17 0000000
18 # post link group, latitude
19 001000000
20 001000000
21 001000000
22 001000000
23 001000000
24 001000000
25 000100000
26 # initial group, latitude
27 00100
28 00100
29 00100
30 00100
31 00100
32 00000
33 00000
34 # initial and post group number:
35 8929 8965
36 # Output pattern 83
37



Figure 3. Schematic example of encoding a link between two groups a (initial) and b (post).
"Zeros" and "ones" signify the absence and presence (respectively) of a sunspot group in a 
longitude or latitude bin. The moderate variation in longitude is visible in the encoded data 
(lines 11-17). The post link group is encoded with observations nearest the limb ranked 
highest. Thus the ninth line of the data representation encodes the longitude of group b 
observed closest to the east limb. The line of "ones" to the right of centre indicates that 
this group appears at a greater longitude than the initial group. Similarly, latitude indicated 
by colour is encoded in lines 27-33 for group a and lines 19-25 for group b. Line 37 encodes the 
judgement on whether or not the two groups shown are the same recurrent group; accordingly, 
'zero' is false, 'one' is true. Line 35 includes the Greenwich group numbers of the two groups. 
These are not used in decision making; they are used to identify groups after the neural network
decision making process is complete. 



Solar cycle 15 (August 1913 - August 1923) has been selected as the period 
to identify recurrence manually, since this cycle is neither the longest nor the 
shortest. It can also be described as a period when the overall sunspot group 
number is neither particularly large nor small. In short, this cycle is chosen 
because it is "typical". 

Only sunspot groups in the GPR(unreliable) dataset are considered for
recurrence. Pairs of groups which appear to be linked by recurrence are selected
for the entire ten years of GPR(training). The training data, GPR(training),
consist of 4073 examples, 621 of which are "linked" (probability = 1); the
remainder are "unlinked" (probability = 0). Such a dataset, which contains
unequal numbers of true and false examples, is called "unbalanced". Provost
(2000) highlights the problems associated with using unbalanced data with
standard machine learning algorithms. Fawcett (2004) suggests the use of receiver
operating characteristic (ROC) curves to evaluate a classifier trained on
unbalanced data, while Lawrence et al. (1998) provide a number of suggestions in
order to balance unbalanced data.

Figure 4. Extreme deviation is measured for all sunspot groups within the GPR that are
observed to exist for more than ten days. The extreme deviation is measured as the difference
between the absolute maximum and minimum recorded heliographic positions ("maximum
deviation", x-direction). Deviation counts are binned into solar cycle bins. Each solar cycle
is divided into nine bins ("solar cycle bin", y-direction). Deviation counts are normalised
("normalised occurrence", z-direction). The extreme latitude deviation (top) illustrates that
groups commonly move a few degrees but rarely more than five degrees. Extreme longitude
(bottom) deviations indicate a greater spread but with groups rarely moving more than 15°.

In this study, the training dataset is balanced using two techniques: firstly, 
the addition of random noise (switching a single random adjacent bit in the 
network representation) and secondly a technique based on a priori knowledge 
of the problem domain. This second technique uses the following assumption: 
if one accepts that a given pair of groups is linked, any pair which has a
smaller unseen latitude and/or longitude deviation must also be linked. Hence,
new linked groups can be created from existing groups by re-encoding a linked
pair of groups with a lower sensitivity than that described in Section 2.2. Using
these two techniques, the training data are balanced to produce 3756 positive
examples and 3827 negative examples. 
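
The first balancing technique can be sketched as below. The text does not
spell out exactly how a bit is "switched", so this sketch assumes one reading:
in one randomly chosen row, the set bit is moved to a neighbouring bin. The
function name is hypothetical.

```python
import random


def jitter_pattern(rows, rng):
    """Return a noisy copy of an encoded pattern.

    Assumption: "switching a single random adjacent bit" is read as moving
    the 1 in one randomly chosen non-empty row to the bin next to it.
    The input rows are left unchanged.
    """
    new_rows = [list(row) for row in rows]
    occupied = [i for i, row in enumerate(new_rows) if 1 in row]
    if not occupied:
        return new_rows
    row = new_rows[rng.choice(occupied)]
    position = row.index(1)
    neighbours = [p for p in (position - 1, position + 1) if 0 <= p < len(row)]
    if neighbours:
        row[position] = 0
        row[rng.choice(neighbours)] = 1
    return new_rows


# Made-up two-row pattern; a seeded generator keeps the example repeatable.
original = [[0, 0, 1, 0, 0], [0, 0, 0, 0, 0]]
noisy = jitter_pattern(original, random.Random(0))
```

Repeating this on linked examples yields additional positive patterns that
differ from their parents by one bin in one time step.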

Ten-fold cross validation (Kohavi, 1995) has been used during training to pro- 
vide the learning and error rates. These are measured to identify network designs 
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that exhibit overfitting or undcrfitting (Moore, 2001). Ovcrfitting is detected 
by observing that the error rate does not increase as learning is taking place. 
Under-fitting is detected by observing that learning has approached a stable 
minimum. Six network designs were constructed with varying numbers of hidden 
layers and interconnects. These designs are documented in the Master's Thesis of 
Hcnwood (2008) , which also provides an assessment of the performance of each 
network. The trained network that exhibits the least overfitting or undcrfitting 
is presented with the GPR(unreliable) dataset and, after classification, returns 
a new dataset with recurrent sunspots grouped and labelled with the group 
number of the first observed sunspot group. This dataset is called GPR(linked') 
and contains 5374 group linkages. 
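
For readers unfamiliar with ten-fold cross validation, the splitting step can be sketched as below. The helper `k_fold_indices` is a hypothetical name and the shuffling scheme is an assumption; the software actually used for training is not described in this section.

```python
import random

def k_fold_indices(n_examples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

n = 3756 + 3827  # balanced positive + negative examples quoted in the text
splits = list(k_fold_indices(n))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each example is held out exactly once, so the error rate averaged over the ten folds is measured on data the network did not see during the corresponding training run.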

A comparison of GPR(training) with GPR(linked') was then performed by 
hand. This revealed 20 or so linked groups identified by the neural network that 
exhibited a large deviation of latitude or longitude during unseen passage. We 
judged these to be physically unlikely but since there is no absolute criterion 
for this classification these could in principle be identified as linked groups by 
another observer. We have chosen a criterion with which to filter the data. 

A filter was constructed which removed linked groups that exhibited a large 
latitude deviation (> 8.5°) or whose post-link group was not observed near the 
East limb, which corresponds to a large (> 19.5°) longitude deviation. This filter 
removed 17 linkages (3%) from GPR(training) and 244 linkages (5%) from 
GPR(linked'). This suggests that the majority of the "physically unlikely" groups 
were not a result of overfitting or underfitting but are the result of subjectivity 
during selection of linked groups for the training dataset GPR(training). 
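
A filter of this kind amounts to two threshold tests per linkage. The sketch below uses the thresholds quoted above; the field names, the use of absolute deviations and the toy values are assumptions made for illustration only.

```python
MAX_LAT_DEV_DEG = 8.5   # latitude threshold quoted in the text
MAX_LON_DEV_DEG = 19.5  # longitude threshold quoted in the text

def is_plausible_link(lat_dev_deg, lon_dev_deg):
    """True if both unseen-passage deviations are within the thresholds."""
    return (abs(lat_dev_deg) <= MAX_LAT_DEV_DEG
            and abs(lon_dev_deg) <= MAX_LON_DEV_DEG)

# Toy linkages with hypothetical field names and made-up deviations (degrees).
candidate_links = [
    {"pre_group": 1, "post_group": 2, "lat_dev": 1.5, "lon_dev": 6.0},
    {"pre_group": 3, "post_group": 4, "lat_dev": 9.2, "lon_dev": 3.0},
    {"pre_group": 5, "post_group": 6, "lat_dev": -2.0, "lon_dev": 21.0},
]
kept = [link for link in candidate_links
        if is_plausible_link(link["lat_dev"], link["lon_dev"])]
print([(link["pre_group"], link["post_group"]) for link in kept])
```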

A final dataset, GPR(linked), was produced by applying the filter described 
above to GPR(linked'). The result is a dataset with contentious linkages 
removed. This dataset is used throughout the remainder of this study and is 
available at the UK Solar System Data Centre (http://www.ukssdc.ac.uk/wdccl/ 
greenwich/recurrence) 

3. Comparison with Manual Datasets 

The UK Solar System Data Centre (http://www.ukssdc.ac.uk) maintains a com- 
plete set of annals that were published by the Royal Greenwich Observatory; 
these provided the source for the GPR. As an appendix to the Greenwich ob- 
servations, a "Catalogue of Recurrent Groups of Sun Spots, 1874 - 1906" was 
compiled by A.S.D. Maunder and published in 1909. Between 1916 and 1955, 
the "Ledgers of Groups of Sunspots" included two sections: "Recurrent" and 
"Non-Recurrent". Recurrent groups were also identified between 1907 and 1915 
but tabulation of the information during that time was different. 

While a digital version of these records is not known to exist, the method used 
to compile recurrent spots is documented (in the Greenwich Photoheliographic 
Results, 1956): 

Recurrent groups were selected upon the following plan, reference being made 
to the General Catalogue:- If any spot when first seen was 60° or more to the 
east of the central meridian, the catalogue and, if necessary, the Daily Results 

Table 1. Recurrent sunspot groups found by Henwood (2008), termed GPR(training), 
and two other recurrent group datasets tabulated within the Royal Greenwich Observatory 
publications are compared with GPR(linked). The columns of values, from left to right, 
are: the number of linked groups which the human and neural network classifier agree on 
(True Positive count); the number of linkages which both the human and neural network 
agree are not linked (True Negative count); the number of links which the neural network 
classifies as true but the human does not (False Positive count); the number of linkages the 
human thinks are true but the neural network does not (False Negative count). 

Dataset          True Positives   True Negatives   False Positives   False Negatives
GPR(training)    450              3799             34                170
RGO 1958         50               2032             106               11
RGO 1896         15               180              13                11


also, were searched some fifteen to sixteen days earlier to ascertain whether 
a spot group of similar heliographic longitude and latitude was then near the 
west limb of the Sun. Similarly, if any spot group when last seen was 60° or 
more to the west of the central meridian, a search was made fifteen to sixteen 
days later. When there appeared to be a case of probable continuity between 
groups in consecutive rotations of the Sun, the character of the groups, their 
areas and their longitude and latitude have been carefully compared before 
accepting them as recurrent groups. 

Between 1874 and 1906, Maunder catalogued 624 recurring groups: 468 were 
seen only in two rotations, 118 appeared in three rotations, 25 in four rotations, 
12 in five rotations, and 1 (somewhat doubtfully) in six rotations. 

The process of verification addresses the question of whether or not the work 
described in this paper has been completed correctly. One approach is to compare 
GPR(linked) with the recurrence observations that were published by the Royal 
Greenwich Observatory. Recurrence data for 1896 and 1958 were typed in and 
compared with GPR(linked). For interest, the performance of the neural network 
against the training data, GPR(training), was also evaluated. 

Table 1 compares these three different linked sunspot group datasets with 
GPR(linked) using metrics for a confusion matrix (Kohavi and Provost, 1998). 
True positive counts are links which are in both the manual and GPR(linked) 
datasets. False positives are links which are only in GPR(linked). False negatives 
are links which are in the manual dataset but not in GPR(linked). True negatives 
are defined as: TN = Total - TP - FP - FN. 
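
The counts in Table 1 can be combined into summary rates. The sketch below recovers the total from the relation above and computes precision and recall, which are derived quantities not reported in this paper; both treat the manual catalogue as the reference, an assumption that the discussion of "False Positive" linkages in this section calls into question.

```python
# Counts copied from Table 1.
TABLE_1 = {
    "GPR(training)": {"TP": 450, "TN": 3799, "FP": 34, "FN": 170},
    "RGO 1958": {"TP": 50, "TN": 2032, "FP": 106, "FN": 11},
    "RGO 1896": {"TP": 15, "TN": 180, "FP": 13, "FN": 11},
}

def summarise(counts):
    """Return (total, precision, recall) for one confusion matrix."""
    total = counts["TP"] + counts["TN"] + counts["FP"] + counts["FN"]
    precision = counts["TP"] / (counts["TP"] + counts["FP"])
    recall = counts["TP"] / (counts["TP"] + counts["FN"])
    return total, precision, recall

for name, counts in TABLE_1.items():
    total, precision, recall = summarise(counts)
    print(f"{name}: total={total}, precision={precision:.2f}, recall={recall:.2f}")
```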

By rapidly constructing longitude-time diagrams that correspond to the pe- 
riod of interest the instances of false classification have been investigated individ- 
ually. As may be expected, GPR(training) data performs the best overall. It does 
not achieve 100% success compared to the human classification. On inspection 
of the "False Positive" classifications made by the neural network, all but one of 
these linkages were discovered to be physically likely linkages which the human 
classifier had overlooked. During inspection by longitude-time diagrams it was 
observed that these "False Positive" results occurred in dense complexes of many 
groups. The neural network out-performs the human under these conditions. 




The Royal Greenwich Observatory 1958 recurrence records show the highest 
number of false positives. On investigating the set of examples classified as false 
positives, it was observed that these were again a result of a complication arising 
during observation of longitude-time diagrams. It should be noted that 1958 
was a year during which the sunspot number reached a record high. This level 
of activity corresponds to a greater number of observations to display on a 
longitude-time diagram, which increases the challenge to a human of identifying 
all potentially linked groups. 

In addition, the method used during 1958 was restricted to providing only 
one link between two given groups. However, the neural network classifier could 
find multiple linkages between groups. Finally, the neural network classifier could 
also make a link between groups even if one of the groups was only visible for a 
single day. Such a linkage is apparently not permitted within the 1958 recurrence 
dataset. 

The RGO 1896 recurrence records showed the greatest discrepancy from the 
neural network linked dataset. This is probably because the criteria used to iden- 
tify a recurrent group are different from both the 1958 and the GPR(training) 
datasets. Maunder used heliographic position, allowing for a sunspot group to 
remain unseen for some days and marking it as recurrent if another group was 
observed at the same heliographic position. The method employed here is to 
require a sunspot group to meet the "unreliable observed" criteria. Some of 
the groups marked as recurrent by Maunder failed to be seen within 30° of 
the relevant limb. It should also be noted that 1958 was a year of considerable 
activity on the Sun, whereas 1896 was relatively quiet, reducing the opportunity 
to observe recurrence. 



4. Gnevyshev-Waldmeier Rule within Recurrent Groups 

After classifying recurrence, one might expect the GPR(linked, reliable) dataset 
to exhibit some of the well established physical characteristics of non-recurrent 
sunspot groups. The Gnevyshev-Waldmeier rule (Gnevyshev, 1938; Waldmeier, 
1955) states that sunspot group maximum area (A_0) and lifetime (T) are 
proportional: 

$$A_0 = D_{GW}\,T, \qquad D_{GW} \approx 10\ \mathrm{MSH\ day^{-1}}, \tag{1}$$

where D_GW is the constant of proportionality measured in millionths of the solar 
hemisphere (MSH) per day. 

In their paper on sunspot decay, Petrovay and van Driel-Gesztelyi (1997) 
used the Debrecen recurrence dataset (Dezső, Gerlei, and Kovács, 1987; Dezső, 
Gerlei, and Kovács, 1997). They chose to use groups which "were born on 
the visible hemisphere and also died there", so that their lifetimes could be 
determined accurately. Since only one group satisfied this criterion, they applied 
the Gnevyshev-Ringnes correction to the remaining recurrent groups to make a 
total dataset of 128 groups. The Gnevyshev-Ringnes correction provides the 

Figure 5. Recurrent sunspot group lifetime (measured in days) plotted against maximum 
whole-spot size measured in millionths of the solar hemisphere (MSH). Lifetime is measured as 
the duration between first and last observation of a sunspot group. The GPR(linked, reliable) 
dataset contains 841 observations and a linear fit of D_GW = 11.73 ± 0.26 is found. The data 
are divided into three age categories (grp 1, grp 2 and grp 3) and the centre of mass of each 
is indicated with a cross. The error bounds are marked by dashed lines and D_GW = 10 is 
included for comparison. 

probability that birth and death will occur in the visible hemisphere. After 
binning, they found a least-squares linear fit of D_GW = 10.89 ± 0.48 MSH day⁻¹. 

There are 841 groups in GPR(linked, reliable) that are reliably observed 
according to the criteria that both the birth and death of a group must take 
place within ±60° of the solar central meridian. Using reliably observed groups 
means the Gnevyshev-Ringnes correction is not required. Figure 5 shows a linear 
fit through GPR(linked, reliable) of D_GW = 11.73 ± 0.26 MSH day⁻¹. 

All of the lifetimes within GPR(linked, reliable) are accurate to within one 
day. The corresponding measurement of group maximum size is subject to some 
uncertainty because rotation of the Sun carries each recurrent group out of sight 
for a portion of its lifetime. The effect is that the value observed as the maximum 
is either the true maximum or a smaller value. 
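
A proportional relationship of the form A_0 = D_GW T can be fitted by least squares with the line constrained to pass through the origin. The sketch below shows that estimator; it is an assumption that the fit was unweighted and constrained in this way, since the fitting procedure is not spelled out here, and the (lifetime, area) pairs are made-up values rather than entries from GPR(linked, reliable).

```python
def fit_through_origin(lifetimes_days, max_areas_msh):
    """Least-squares slope D for the model A = D * T (no intercept)."""
    numerator = sum(t * a for t, a in zip(lifetimes_days, max_areas_msh))
    denominator = sum(t * t for t in lifetimes_days)
    return numerator / denominator

# Made-up (lifetime, maximum area) pairs used purely to exercise the function.
lifetimes = [20.0, 30.0, 40.0]
areas = [210.0, 290.0, 400.0]
d_gw = fit_through_origin(lifetimes, areas)
print(f"D_GW = {d_gw:.2f} MSH per day")
```

Because the observed maximum area can only underestimate the true maximum, a fit of this kind would, if anything, be biased towards smaller slopes.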

The data show a large scatter around the linear fit (Figure 5). Regions between 
the bands of points (where no reliable observation is possible because either the 
birth or death of the group is unseen) make it somewhat difficult to determine 
the linear relationship. Three age categories are defined as follows: grp 1, ages 17 
to 45 days; grp 2, 46 to 72 days; grp 3, greater than 72 days. The centre of mass 
of each of these groups is indicated by a cross. In addition, since only recurrent 
sunspot groups are examined, there are no data points between the origin and 
≈18 days. 

Compared to the data analysed by Petrovay and van Driel-Gesztelyi (1997), 
the GPR(linked, reliable) data are more numerous. In addition, the GPR(linked, 
reliable) data do not have a limit on the maximum size measured and only 
recurrent groups are included in the fitting. For these reasons, one should not 
expect exact agreement between D_GW in Figure 5 and values found from previous 
investigations. In this study, D_GW has a smaller uncertainty but is larger 
than previous estimates. The discrepancy between the value obtained in this 
paper and the one found by Petrovay and van Driel-Gesztelyi (1997) is small, 
if allowance is made for the appropriate error bounds, which suggests that for 
recurrent groups D_GW is probably closer to 11 MSH day⁻¹ than 10 MSH day⁻¹. 
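
As a rough check on the size of this discrepancy, the two estimates can be compared under the assumption, made here only for illustration, that the quoted uncertainties are independent one-standard-deviation errors:

```latex
\Delta D_{GW} = 11.73 - 10.89 = 0.84\ \mathrm{MSH\ day^{-1}}, \qquad
\sigma_{\Delta} = \sqrt{0.26^{2} + 0.48^{2}} \approx 0.55\ \mathrm{MSH\ day^{-1}}, \qquad
\frac{\Delta D_{GW}}{\sigma_{\Delta}} \approx 1.5 .
```

On that assumption the two values differ by about one and a half combined standard deviations.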

5. Temporal Variations of Sunspot Group Lifetime 

The GPR(linked) dataset presents a unique opportunity to investigate changes 
of sunspot lifetime with time. Previous studies of this property were complicated 
for the reasons already outlined; namely, incomplete recurrence data compiled 
by different individuals using different criteria. 

Blanter et al. (2006) considered the topic of sunspot lifetime. They performed 
a nonlinear study of the short-term correlation properties of solar activity 
in order to reveal their long-lifetime variations. Their method was applied to 
GPR(unreliable) and allows for the problems associated with recurrence within 
the dataset and short-lived sunspots. These authors mitigate such factors by 
examining the population of sunspot lifetimes that are between 1 and 15 days. 
These observations were used to create a 22-year running averaged series. 

One of the conclusions of the study by Blanter et al. (2006) was that the 
sunspot lifetime in GPR(unreliable) had increased over the duration of 
the dataset. In addition, they were able to quantify the change in lifetime, which 
increased by a factor of 1.4 over the interval from 1915 to 1940. During this 
period, GPR(linked) indicates a change from below to above average lifetimes, 
as shown in Figure 6. 

Blanter et al. (2006) discuss the problems associated with both small sunspot 
groups, whose lifetime cannot be perfectly measured, and the lack of observations 
from the invisible side of the Sun. Because of this, they developed a technique 
that used sunspot group size and the Gnevyshev-Waldmeier relationship to infer 
sunspot group lifetimes. Figure 6 presents GPR data (grey pattern region) and 
GPR(linked) (grey solid region) from our study. 

Large errors are observed in the calculation of lifetime from GPR data containing 
recurrent groups. GPR(linked) alone contains much reduced error and 
a trend can be observed. While GPR(linked) only contains groups that have a 
sufficiently long lifetime to be observed on more than one solar rotation, there 
are proportionally fewer such groups, which reduces the lifetime average during 
the sample window. 

GPR(linked) data shown in Figure 6 indicate a marked increase between 1910 
and 1950. Blanter et al. (2006) found that sunspot lifetime had increased by a 

Figure 6. Lifetime of sunspot groups versus time, calculated for the longest-lived sunspot on 
a given day for 22-year (8035 days) moving averages. The GPR age estimate takes no account 
of recurrent sunspot groups and is plotted as a grey pattern contained within the bounds of 
error. GPR(linked) is plotted as a dark grey solid region containing the bounds of uncertainty. 
Error is calculated as the longest and shortest lifetime estimates of a sunspot group lifetime. 
Longest lifetime assumes that group birth was at the earliest possible unseen time and death 
was at the latest possible unseen time. Shortest lifetime is calculated assuming the earliest 
observation in GPR is the birth and the latest observation in GPR is the death. Variations in 
sunspot number are shown to indicate individual solar cycles. 



factor of 1.4 between 1915 and 1940. The results presented here largely agree 
with that value. In addition, Figure 6 suggests that the increase may extend 
over a longer interval and also augments the work of Blanter et al. (2006) by 
introducing an uncertainty measure. 
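
The 22-year smoothing used for Figure 6 corresponds to a window of 8035 days (22 × 365.25, rounded down). A minimal sketch of such a moving average is given below; it assumes a simple unweighted window evaluated only where the window is full, which may differ from the windowing actually used, and the short input series is made up for demonstration.

```python
WINDOW_DAYS = int(22 * 365.25)  # 8035, the window length quoted for Figure 6

def moving_average(values, window):
    """Unweighted moving average over every full window of `values`."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one step: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages

# Made-up daily "lifetime of the longest-lived group" values (days).
daily_lifetime = [30.0, 32.0, 28.0, 35.0, 40.0, 38.0]
smoothed = moving_average(daily_lifetime, window=3)
print(WINDOW_DAYS, len(smoothed))
```

Applying the same function to the longest and shortest lifetime estimates separately would give upper and lower smoothed curves of the kind described for the bounds of uncertainty.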



6. Conclusions 



Neural networks have been applied previously to various problems in space 
science. In particular, they have been used in the prediction of geomagnetic 
phenomena (Lundstedt, 1992; Lundstedt and Wintoft, 1994; Calvo, Ceccato, 
and Piacentini, 1995; Gleisner, Lundstedt, and Wintoft, 1996; Williscroft and 
Poole, 1996; Wu and Lundstedt, 1996; Weigel et al., 1999), the classification of 
asteroid spectra (Howell, Merényi, and Lebofsky, 1994), and ionogram processing 
(Galkin et al., 1996). In addition, they have also been applied to the important 
problem of automatically classifying sunspots from data obtained by processing 
SOHO/MDI satellite images (Nguyen et al., 2006). However, the authors are 
not aware of the previous use of neural networks in studies of recurrent sunspot 
groups. 

It has been shown in this study that it is possible to train a neural network 
to identify recurrent sunspot groups within the Greenwich Photoheliographic 
Results (GPR), which extend over the long interval 1874 - 1976 (Section 2). Once 
trained, the neural network can often outperform a human classifier, particularly 
when a large number of sunspot groups are present on the solar disc at the same 
time. Since the neural network performs deterministically when classifying a 
pair of linked sunspot groups, it operates with a consistency of judgement that 
exceeds the sterling endeavours of various individuals over more than a century. 

Nevertheless, the neural network method has some limitations. These are 
most clearly revealed when considering false-positive linkages (Section 3). In 
such cases, the black-box nature of the decision process becomes problematic 
and confidence in the neural network is partially undermined. In the method 
discussed in this paper, post-processing of the data is used to reduce false-positive 
linkages but in any future study it might be valuable to investigate alternative 
machine learning systems. 

Despite these limitations, the particular choice of network and data reduc- 
tion procedure employed in this study could also be applied to Rome Daily 
Sunspot Reports (1958-2000), USSR Station Data (1968-1991), Mount Wilson 
Individual Sunspot Data (1917-1985), Kodaikanal Individual Sunspot Data 
(1906-1987) and Greenwich/Debrecen Observations (1874-2007), without re- 
training the existing neural network. All of these datasets are available through 
either the National Geophysical Data Centre (http://www.ngdc.noaa.gov) or the 
UK Solar System Data Centre (http://www.ukssdc.ac.uk). 

The constant of proportionality in the Gnevyshev-Waldmeier rule (A_0 = 
D_GW T) derived in this study (Section 4) has been found to be greater than 
previous estimates. Petrovay and van Driel-Gesztelyi (1997), who used information 
on recurrent sunspot groups extracted from the Debrecen Photoheliographic 
Results, also found a value of D_GW that was larger (10.89 MSH day⁻¹) than 
previous estimates. In a study of sunspot group lifetimes, Zuccarello (1993) pre- 
sented results which show a change in the rotation rate between short-lived and 
long-lived (11-day old) sunspot groups, pointing to a difference in "aggregation 
capability" of the group within the convection zone. The present results suggest 
that there may be some additional physics present within longer lived groups in 
the GPR(linked, reliable) dataset and thereby imply that this matter warrants 
further investigation. 

Evidence has been found for an increase in the lifetime of recurrent sunspot 
groups by a factor of about 1.4 between 1915 and 1940 (Section 5), which is in 
excellent agreement with the result obtained by Blanter et al. (2006). Indeed, 
this increase in lifetime actually occurs over a longer period (1915 - 1950) than 
previously thought and there is also provisional evidence for a slight decrease in 
lifetime between 1950 and 1965 (see Figure 6). 

Solar changes over periods longer than a few decades are currently of consid- 
erable interest as the solar output is a significant input to climate models (Haigh 
et al., 2005). The Gleissberg cycle is detected in sunspot number and has a measured 
period of approximately 80-120 years (Gleissberg, 1967; Hoyt, Schatten, 
and Nesme-Ribes, 1994; Garcia and Mouradian, 1998). Garcia and Mouradian 
(1998) estimated the most recent Gleissberg cycle minimum to be around 1900 
and the maximum around 1965. The results presented here suggest that, if the 
lifetimes of recurrent sunspot groups are a good proxy for the Gleissberg cycle, 
the maximum occurred some years before 1965, during the 1950s. Further study of 
this topic would be facilitated by applying the trained neural network to sunspot 
data in the interval 1977-2009. 

The analysis of historical sunspot observations using automated methods, 
such as the neural network method presented in this paper, is required to esti- 
mate the (true) total number of sunspots emerging on the solar surface during the 
solar cycle. Numerical simulations of the centennial evolution of magnetic flux on 
the solar surface scale the sunspot emergence rate to the total sunspot number, 
measured without considering recurrent sunspots and their variable lifetimes in 
great detail (Wang, Lean, and Sheeley, 2002). A sunspot number that corrects 
for the recurrence of long-lived sunspots may improve the numerical predictions 
by providing a better description of emergence. The numerical simulations over 
centennial scales could be compared to the most recent estimates of long-term 
variation of the open magnetic flux on the solar surface (Rouillard, Lockwood, 
and Finch, 2007). Neural networks could also be optimised to detect the location 
of the umbra and penumbra of sunspots and to estimate the total magnetic flux 
inside these large-scale active regions. Such a rough estimate of the magnetic 
flux on the photosphere could then be used to estimate the magnetic topology 
of the corona using simplified potential field source surface models. 